Abstract

Background: Currently, association studies are analysed using statistical mixed models, with marker effects 
estimated by a linear transformation of genomic breeding values. The variances of marker effects are needed when 
performing the tests of association. However, approaches used to estimate the parameters rely on a prior variance 
or on a constant estimate of the additive variance. Alternatively, we propose a standardized test of association
using the variance of each marker effect, which generally differs among markers. Random breeding values from
a mixed model including fixed effects and a genomic covariance matrix are linearly transformed to estimate the 
marker effects. 

Results: The standardized test was neither conservative nor liberal with respect to type I error rate (false-positives), 
compared to a similar test using Prediction Error Variance, a method that was too conservative. Furthermore,
genomic predictions are solved efficiently by the procedure, and the p-values are virtually identical to those 
calculated from tests for one marker effect at a time. Moreover, the standardized test reduces computing time 
and memory requirements. 

The following steps are used to locate genome segments displaying strong association. The marker with the
highest -log(p-value) in each chromosome is selected, and the segment is expanded one Mb upstream and one
Mb downstream of the marker. A genomic matrix is calculated using the information from those markers only,
which is used as the variance-covariance of the segment effects in a model that also includes fixed effects and
random genomic breeding values. The likelihood ratio is then calculated to test for the effect in every chromosome
against a reduced model with fixed effects and genomic breeding values. In a case study with pigs, a significant
segment from chromosome 6 explained 11% of total genetic variance.

Conclusions: The standardized test of marker effects using their own variance helps in detecting specific genomic 
regions involved in the additive variance, and in reducing false positives. Moreover, genome scanning of candidate 
segments can be used in meta-analyses of genome-wide association studies, as it enables the detection of specific 
genome regions that affect an economically relevant trait when using multiple populations. 
Background

The availability of high density genotypes of single nu- 
cleotide polymorphism (SNP) markers for plants and 
livestock species, in conjunction with phenotypic data 
for complex traits, allows the calculation of: 1) estimates 
of genomic breeding values (GEBVs) [1,2] for genomic 
evaluation [3], and 2) estimates of the effects of genomic 
regions associated with the genetic variability in genome 
wide association studies (GWAS) [2,4,5]. 

An increasing number of GWAS data sets are analyzed by mixed models and multiple testing procedures
[6], after fitting all individual effects of genomic regions
into the model [4]. The model may be difficult to fit
when both the number of individuals and the number of SNP effects
are large. We propose to use a linear transformation of
genomic breeding values to estimate the marker effects
from a simpler equivalent mixed model, and then to test
those effects using a standardized test statistic that employs
the variance (rather than the prediction error variance)
of the same effects.

The method of genomic selection proposed by Meuwissen
et al. [7] to estimate GEBVs starts by fitting the SNP effects
to a given data set. The GEBV of any individual is then
estimated from its SNP genotype, by adding across the entire
genome those solutions corresponding to the individual's
SNP. The mixed model employed includes vectors of fixed
effects and random effects of markers or SNPs ($g$), assumed
to be normally distributed with null mean and a covariance
matrix proportional to the identity matrix times
the variance of SNP effects, $I\sigma_g^2$. Errors are assumed to
be Gaussian, independent and identically distributed with
null mean and covariance matrix $I\sigma_e^2$. An equivalent mixed
model discussed by Garrick [8] and Stranden [9] is fitted
after the linear transformation $a = Zg$, where $a$ is a random
vector of breeding values, and $Z$ an incidence matrix that
relates elements in $a$ to those in $g$. Each column of $Z$ is
associated with a given SNP and the elements are standardized
by functions of SNP allele frequencies and by
the total number of SNP. It is worth noting that the
same $Z$ is used in our implementation of the model of
Meuwissen et al. [7] to relate the vector of marker effects
in $g$ to the data phenotypes. Moreover, GEBVs in
the equivalent model have variance-covariance matrix
$G\sigma_a^2 = ZZ'\sigma_g^2$. The procedure requires that the variances
are equal, i.e. $\sigma_a^2 = \sigma_g^2$. Once the equivalent
model is fit, SNP effects are calculated by the transformation
$\hat{g} = Z'G^{-1}\hat{a}$, and each SNP effect in $\hat{g}$ is divided
by the square root of its variance, $\mathrm{Var}(\hat{g}_j)$, to get
the so-called $SNP_{ej}$ test statistic. We also provide a formula
to calculate $\mathrm{Var}(\hat{g}_j)$ without having to fit the
model with SNP effects. The next step is to select, for each
chromosome, genome segments that may be highly associated with the
genetic variability of the trait. In
doing so, we look for the SNP having the highest value
of minus the logarithm of the p-value throughout the
chromosome. Once the SNP is located, a segment of
one Mb to the left and one to the right is defined, and a
relationship matrix is calculated using only the information
from those markers. The relationship matrix is
used as the proportional variance-covariance of the segment
effects in a model that also includes fixed effects
and random GEBVs. In a final step, the likelihood ratio
is calculated to test the significance of the largest effect
segment of each chromosome by comparing against a
reduced model with fixed effects and GEBVs. The critical
value (size of the test) is adjusted by the Bonferroni
correction. The algorithm not only delivers genome
wide associations and genomic predictions efficiently,
but it also reduces computing time and memory requirements.
Moreover, the specific variance of the SNP
effects is used in calculating the test, thus taking into
account the amount of information of any given marker.
In contrast, other testing approaches rely on a prior variance
or a constant estimate of the additive variance.

Methods 

Dataset 

The experimental population was raised at the Michigan
State University Swine Teaching and Research Farm, East
Lansing, MI [10]. Parents from the initial generation (F0)
were four Duroc boars mated to 15 Pietrain sows by artificial
insemination. From all resulting F1 animals, 50 females
and 6 males (progeny of 3 F0 sires) were selected as
parents for the F2 generation, avoiding full or half sib
matings. A total of 1,259 F2 piglets were born alive from
142 litters out of 11 farrowing groups. Phenotypic data for
growth, carcass merit and meat quality traits were collected
for approximately 950 F2 pigs (for more details refer
to Edwards et al. [10,11]). Data used for the study were
measures of the growth trait 13 week tenth rib backfat
(mm) (bf10_13wk). The trait was chosen as it displays a
sizable heritability (0.42) and a normal distribution.
Animal protocols were approved by the Michigan State
University All University Committee on Animal Use
and Care (AUF# 09/03-114-00).

Genotyping and data editing 

DNA was isolated from white blood cells using standard
procedures as previously described for this population
[10]. Quantity and quality of DNA samples were determined
using a Qubit fluorometer (Invitrogen by Life
Technologies, Carlsbad, CA, USA). The experimental
population was genotyped with two SNP marker panels:
1) 411 animals were genotyped (4 F0 Duroc boars, 15 F0
Pietrain sows, 6 F1 males, 50 F1 females and 336 F2 pigs)
with a commercial panel, the Illumina PorcineSNP60
beadchip (60 K) [12], and 2) 612 F2 animals were genotyped
with a second panel composed of a 9 K tagSNP
set referred to as the GeneSeek Genomic Profiler for
Porcine LD (GGP-Porcine, GeneSeek a Neogen Company,
Lincoln, NE) [13]. A set of 5,350 SNP out of M = 62,163
were eliminated from all analyses as their physical positions
were unknown. Mendelian inconsistencies (<0.01%)
were taken as missing genotypes, and 21 animals (1 F1
and 20 F2) with more than 10% of SNP missing were not
used for any analysis. By similar considerations, 2,978
SNP were removed from the analyses as they had more
than 10% missing data. Additionally, 9,877 SNP were excluded
as their minor allele frequency (MAF) was below
0.01. This editing procedure followed that of Badke et al.
[14] and Gualdron et al. [15], and the program PLINK v1.07
[16] was used for the task. F2 animals genotyped with the
9 K panel were imputed to 60 K following procedures discussed
by Gualdron et al. [15], by means of the software
AlphaImpute [17], resulting in imputation accuracy of
around 0.99 [15]. Genotypes imputed in the F2 had a second
editing procedure by MAF < 0.01, which excluded 759
virtually monomorphic SNP. The editing policies and genotype
imputation resulted in a data set with records from
1002 pigs (F0, F1 and F2) having 44,055 SNP per animal.

Estimation of genomic relationship matrix 

The genomic relationship matrix was estimated from observed
and imputed high density (~44 K) SNP genotypes.
Genotypes were expressed as allelic dosage [13,15], such
that genotypes were entered into a marker matrix $M$ of dimension
($n \times m$), where $n$ is the number of animals and $m$
the number of SNP, having elements in the interval [0, 2],
i.e. the count of the allele used as reference. In the sequel,
we will use the subindex $i$ to refer to the individual and
the subindex $j$ to refer to the marker.
Matrix $M$ was standardized to matrix $Z$ that has generic
elements equal to

$$z_{ij} = \frac{M_{ij} - 2p_j}{\sqrt{m \, 2p_j(1 - p_j)}}$$

Elements of $Z$ are thus calculated by subtracting twice the
frequency of the reference allele at the $j$th marker ($p_j$)
from the corresponding element of $M$ [18], and then dividing the
resulting difference by the square root of the expected variance
$2p_j(1 - p_j)$ of each element in the column multiplied by
the number of columns ($m$) in $M$. The allele frequency $p_j$
was calculated from the F0 generation (19 animals). The
genomic relationship matrix was finally calculated as:

$$G = ZZ' \qquad (1)$$

Prediction model

Using the genomic relationship matrix from equation (1),
the centered animal model for genomic evaluation can be
written as:

$$y = X\beta + a + e \qquad (2a)$$

where $y$ is the phenotypic vector containing the data on
13-week tenth rib backfat (mm), $X$ is the incidence matrix
that relates records to the fixed effects of sex in $\beta$, vector
$a$ contains the random breeding values such that
$a \sim N(0, G\sigma_a^2)$, $e$ is the random error vector such that
$e \sim N(0, I\sigma_e^2)$, and $I$ is the identity matrix. Variance components
were estimated with REML using the regress version
1.3-10 R package [19].
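The standardization of the dosage matrix into Z and the product G = ZZ' of equation (1) can be illustrated with a minimal NumPy sketch. The function names, the toy dimensions and the simulated allele frequencies are assumptions made for illustration only; in the study the frequencies p_j were taken from the 19 F0 founders, whereas here they are simply passed in as an argument.

```python
import numpy as np

def standardize_markers(M, p):
    """Return Z with z_ij = (M_ij - 2 p_j) / sqrt(m * 2 p_j (1 - p_j)).

    M : (n, m) allelic dosages in [0, 2]
    p : (m,) reference-allele frequencies (hypothetical input; the paper
        estimates them from the F0 generation)
    """
    M = np.asarray(M, dtype=float)
    p = np.asarray(p, dtype=float)
    m = M.shape[1]
    return (M - 2.0 * p) / np.sqrt(m * 2.0 * p * (1.0 - p))

def genomic_relationship(Z):
    """Genomic relationship matrix of equation (1): G = Z Z'."""
    return Z @ Z.T

# Toy example with simulated genotypes (not the pig data).
rng = np.random.default_rng(1)
n, m = 6, 50
p = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, p, size=(n, m)).astype(float)
Z = standardize_markers(M, p)
G = genomic_relationship(Z)
```

Because the scaling by m is folded into Z, no further division is needed after forming the product ZZ'.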

Following Stranden et al. [9], an equivalent model to
(2a) is

$$y = X\beta + Zg + e \qquad (2b)$$

Every element in (2b) is defined as before except for
the vector $g$ of SNP effects. To show that (2a) and (2b)
are equivalent models, we employ the fact that $a = Zg$.
Then, the variances of $a$ and $g$ are related in the following
manner:

$$G\sigma_a^2 = \mathrm{Var}(a) = \mathrm{Var}(Zg) = Z \, \mathrm{Var}(g) \, Z' = ZZ'\sigma_g^2, \qquad \mathrm{Var}(g) = I\sigma_g^2$$

Necessary conditions for models (2a) and (2b) to be
equivalent (Henderson, 1984) are that $G = ZZ'$ and
$\sigma_a^2 = \sigma_g^2$.

Variance of SNP effects 

In this section, we describe the algorithm to calculate the
variance of the estimated SNP effects $\hat{g}$ (i.e. $\mathrm{Var}(\hat{g})$). The
SNP effects were obtained from a linear transformation of
the predicted breeding values in $\hat{a}$ [4,9,20,21], as follows:

$$\begin{aligned}
BLUP(g) = \hat{g} &= cov(g, a') \, [\mathrm{Var}(a)]^{-1} \hat{a} \\
&= cov(g, g') \, Z' G^{-1} (\sigma_a^2)^{-1} \hat{a} \\
&= Z' G^{-1} \frac{\sigma_g^2}{\sigma_a^2} \hat{a} = Z' G^{-1} \hat{a}
\end{aligned} \qquad (3)$$

The last step results from the fact that model equivalence
involves $\sigma_a^2 = \sigma_g^2$. Now, from equation (3), $\mathrm{Var}(\hat{g})$
is obtained as follows:

$$\mathrm{Var}(\hat{g}) = \mathrm{Var}(Z' G^{-1} \hat{a}) = Z' G^{-1} \, \mathrm{Var}(\hat{a}) \, G^{-1} Z \qquad (4)$$

Now, we know that the prediction error variance (PEV)
of $\hat{a}$ from model (2a) is equal to:

$$PEV(\hat{a}) = \mathrm{Var}(a - \hat{a}) = C^{aa} = \mathrm{Var}(a) - \mathrm{Var}(\hat{a})$$

So that

$$\mathrm{Var}(\hat{a}) = \mathrm{Var}(a) - C^{aa} = G\sigma_a^2 - C^{aa}$$

Matrix $C^{aa}$ results from inverting the coefficient
matrix of the mixed model equations [22] such that:

$$C^{aa} = \sigma_e^2 \left( I - X(X'X)^{-1}X' + G^{-1}\lambda \right)^{-1}, \qquad \lambda = \frac{\sigma_e^2}{\sigma_a^2}$$

Then, on replacing the latter expression into
$\mathrm{Var}(\hat{a})$ (displayed in (4)), we have:

$$\begin{aligned}
\mathrm{Var}(\hat{g}) &= Z' G^{-1} \left[ G\sigma_a^2 - C^{aa} \right] G^{-1} Z \\
&= Z' G^{-1} Z \, \sigma_a^2 - Z' G^{-1} C^{aa} G^{-1} Z
\end{aligned} \qquad (5)$$

Expression (5) results in a large matrix of dimension
($m \times m$), with $m$ the number of SNP. However, we only
need its diagonal elements. Also notice that the first
term in (5), $Z' G^{-1} Z$, can be computed and stored to
be reused for the different traits, whereas $C^{aa}$ has to be
computed for each trait.
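Expressions (3) to (5) can be sketched numerically as follows. This is an illustrative NumPy implementation under stated assumptions, not the authors' code: the function name and toy data are hypothetical, the variance components are treated as known (the study estimated them by REML), each animal has a single record, and a pseudo-inverse is used for X'X. Only the diagonal of (5) is formed, so the m x m matrix is never built.

```python
import numpy as np

def snp_effects_and_variances(Z, X, y, var_a, var_e):
    """Back-solve SNP effects (3) and the diagonal of Var(g_hat) (5)."""
    n = Z.shape[0]
    G = Z @ Z.T
    Ginv = np.linalg.inv(G)                      # assumes G is full rank
    lam = var_e / var_a
    # Projector that absorbs the fixed effects: I - X (X'X)^- X'
    P = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    # C^{aa}: breeding-value block of the inverse mixed-model coefficient matrix
    Caa = var_e * np.linalg.inv(P + Ginv * lam)
    # BLUP of a solves (P + G^{-1} lam) a_hat = P y
    a_hat = Caa @ (P @ y) / var_e
    B = Ginv @ Z                                 # n x m, equals G^{-1} Z
    g_hat = B.T @ a_hat                          # Z' G^{-1} a_hat, equation (3)
    # diag(Z' G^{-1} Z) * var_a - diag(Z' G^{-1} C^{aa} G^{-1} Z), equation (5)
    var_g = var_a * np.einsum("ij,ij->j", Z, B) - np.einsum("ij,ij->j", B, Caa @ B)
    return g_hat, var_g

# Toy data (simulated; the variance components are illustrative inputs).
rng = np.random.default_rng(7)
n, m = 30, 80
p = rng.uniform(0.2, 0.8, size=m)
M = rng.binomial(2, p, size=(n, m)).astype(float)
Z = (M - 2 * p) / np.sqrt(m * 2 * p * (1 - p))
X = np.ones((n, 1))
y = rng.normal(size=n)
var_a, var_e = 2.68, 3.70
g_hat, var_g = snp_effects_and_variances(Z, X, y, var_a, var_e)
```

The first einsum term depends only on Z and G, so it could be cached across traits, in line with the remark on reusing $Z' G^{-1} Z$.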

Standardization of SNP effects ($SNP_{ej}$)

The estimated SNP effects in (3) were standardized by dividing
by the square root of their corresponding $\mathrm{Var}(\hat{g}_j)$ obtained from (5)
as follows:

$$SNP_{ej} = \frac{\hat{g}_j}{\sqrt{\mathrm{Var}(\hat{g}_j)}} \qquad (6)$$



P-values and genome screening 

The p-values were assessed as 1 minus the cumulative
probability of the absolute value of $SNP_{ej}$, a
number that was then multiplied by 2 so as to obtain:

$$p\text{-value}_j = 2\left(1 - \Phi(|SNP_{ej}|)\right)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal
distribution for the random variable $x$. When analyzing the
trait 13 week tenth rib backfat (mm), the p-values for each
SNP were plotted across the genome as $-\log_{10}$(p-value)
using the physical position of the SNP in Mega-bases (Mb).
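The standardization in (6) and the two-sided p-value can be sketched with the Python standard library alone. The function names are hypothetical; the identity 2(1 - Φ(|x|)) = erfc(|x|/√2) is used so that very small p-values do not lose precision to the subtraction from 1.

```python
import math

def standardized_test(g_hat_j, var_g_hat_j):
    """SNP_ej of equation (6): the effect divided by its standard deviation."""
    return g_hat_j / math.sqrt(var_g_hat_j)

def two_sided_p_value(snp_e):
    """2 * (1 - Phi(|x|)), computed as erfc(|x| / sqrt(2))."""
    return math.erfc(abs(snp_e) / math.sqrt(2.0))

def minus_log10_p(snp_e):
    """Value plotted on the y axis of a Manhattan plot."""
    return -math.log10(two_sided_p_value(snp_e))
```

For instance, `two_sided_p_value(1.959964)` is approximately 0.05, the familiar two-sided normal critical value.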

Standardization of SNP effects using the PEV of the marker 

A second standardization of the SNP effect (3) was
performed using the PEV($\hat{g}$) as follows:

$$SNP_{epj} = \frac{\hat{g}_j}{\sqrt{\mathrm{Var}(g_j) - \mathrm{Var}(\hat{g}_j)}} \qquad (7)$$

As discussed above, $\sigma_g^2 = \sigma_a^2$. The p-values and genome
screening for $SNP_{epj}$ were assessed and plotted in
the same fashion as for $SNP_{ej}$.



Simulation 

A plasmode simulation was performed to compare how
the standardized values $SNP_{ej}$ and $SNP_{epj}$ affected the
nominal size of the test for the effect to be equal to zero.
Data on 928 animals with 44,055 SNP each were used
for the study, and the 1018 SNP on chromosome 18
were reshuffled. Two scenarios were considered: 1) Dependency:
rows of the genotype matrix were permuted
for columns corresponding to SNP on chromosome 18,
thus keeping Linkage Disequilibrium (LD) within the chromosome
but breaking the relationship between genotypes
and phenotypes for the 1018 SNP on the chromosome. 2)
Independency: the genotype of any animal was permuted
independently by marker (resulting in linkage equilibrium,
or LE, between markers) for those SNP on chromosome
18, and the relationship with the phenotype was broken
too. For both scenarios model (2a) was fitted to the
data, and two tests were calculated for each scenario:
test1 = $SNP_{ej}$ and test2 = $SNP_{epj}$. Permutations were repeated
200 times per scenario, and in each permutation
the G matrix was calculated while fitting model (2a). As a
result, the heritability of the trait was similar to the original
heritability due to relationships in the other 17 chromosomes
being kept intact, and p-values for those SNP
(that are now non-associated) on chromosome 18 were
obtained for the different tests. Under the null hypothesis
and assuming independence (i.e., SNP are unlinked to the
polymorphism controlling the trait), the 1018 test p-values
from an approach that controls for type I error appropriately [23]
follow a uniform distribution. Consequently, to
estimate the empirical quantiles of the distribution for the
null hypothesis, we used a uniform density U(0, 1) to
generate 200 replicated sets for the 1018 p-values.
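The two reshuffling scenarios can be sketched as follows. This is a minimal illustration with hypothetical function names and a small simulated genotype matrix; it only shows how the permutations differ, not the refitting of model (2a) in each of the 200 replicates.

```python
import numpy as np

def permute_dependent(M, cols, rng):
    """Scenario 1 (LD kept): one row permutation shared by all selected columns."""
    out = M.copy()
    perm = rng.permutation(M.shape[0])
    out[:, cols] = M[perm][:, cols]
    return out

def permute_independent(M, cols, rng):
    """Scenario 2 (LE): each selected column gets its own row permutation."""
    out = M.copy()
    for j in cols:
        out[:, j] = rng.permutation(M[:, j])
    return out

# Toy genotype matrix; the last four columns stand in for chromosome 18.
rng = np.random.default_rng(18)
M = rng.integers(0, 3, size=(20, 10)).astype(float)
chr18_cols = np.array([6, 7, 8, 9])
dep = permute_dependent(M, chr18_cols, rng)
ind = permute_independent(M, chr18_cols, rng)
```

In both scenarios the allele counts of every marker are unchanged; only scenario 1 also preserves the covariance among the reshuffled markers.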

SNP effects and tests obtained by a single marker model 

The SNP effects were also tested on a one by one basis. The
modelling approach used for testing purposes is better known
as "efficient mixed-model association" (EMMA) [24]. The
model included the fixed effects of sex and of one marker at a
time; the random variable was the animal effect with variance-covariance
equal to the genomic relationship matrix using
all markers, which was calculated as described before. The
R package rrBLUP [25] was used for fitting the different
models and for calculating the tests and p-values.

Proportion of variance explained by segments with large 
effect 

After the genome screen using model (2a), the SNP with
the smallest p-values were selected to form SNP segments.
These segments were defined by taking all SNP within
one Mb upstream and one Mb downstream of the SNP
with smallest p-value on each chromosome. The size of
the segment was chosen using a criterion similar to the
one employed by Hayes et al. [4]. The point of change in
the rate of decay in linkage disequilibrium in this population
was about $r^2$ = 0.2 at 1 Mb (data not shown), which
essentially would imply a minimal contribution to the
additive variance from markers located beyond such distance.
Moreover, segment sizes about two Mb have been
reported to be significant in association studies [20,26-28].
The proportion of variance associated with each segment
was estimated by building a genomic relationship matrix
$G_1$ (as described in (1)) using all SNPs that belonged to
the segment, whereas genomic relationship matrix $G_2$ was
built using all remaining SNPs. The model fitted can be
represented as:

$$y = X\beta + a_1 + a_2 + e \qquad (8)$$

where $a_1$ is the vector of additive random effects associated
with those SNP located in the segment, such that
$a_1 \sim N(0, G_1\sigma_{a_1}^2)$, and $a_2$ is the vector of additive random
effects associated with all SNPs except those involved
with $a_1$, such that $a_2 \sim N(0, G_2\sigma_{a_2}^2)$. Model (8)
separates the proportion of variance explained by the segment
of interest (local variance) from the genome variance
explained by all SNPs (global variance). The
variances estimated in (8) were compared with those estimated
from model (2a). Hayes et al. [4] used a similar
model to assess the segment variance. Applying either
model (8) or the approach of Hayes et al. [4] gave similar
estimated variance components. In practice, the advantage
of fitting model (8) is that $G_2$ is computed
by subtracting from $G$ the contribution of the columns of $Z$ related to
the segment being tested. Let $Z_s$ be a matrix having as
columns those related to the segment being tested, then
$G_2 = G - Z_s Z_s'$. On the contrary, in the model of Hayes
et al. [4] $G$ is different from segment to segment. Additionally,
the calculation of $G_1$ and $Z_s Z_s'$ is fast and involves
only those SNPs located in the segment.
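The update $G_2 = G - Z_s Z_s'$ can be illustrated with a short NumPy sketch. The function name and the toy matrix are assumptions; the sketch covers only the subtraction step and does not address any rescaling that may be applied when building $G_1$ from the segment markers.

```python
import numpy as np

def split_relationship(Z, segment_cols):
    """Return (Zs Zs', G2) where G2 = G - Zs Zs' and G = Z Z'."""
    Zs = Z[:, segment_cols]          # columns of Z belonging to the segment
    G = Z @ Z.T
    ZsZs = Zs @ Zs.T                 # contribution of the segment markers
    return ZsZs, G - ZsZs

# Toy standardized marker matrix (simulated values, not genotypes).
rng = np.random.default_rng(6)
Z = rng.normal(size=(5, 12)) / np.sqrt(12)
segment_cols = np.array([3, 4, 5])
ZsZs, G2 = split_relationship(Z, segment_cols)
```

Since G is computed once, testing a new segment only requires the small product $Z_s Z_s'$, which involves the few columns inside the 2 Mb window.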

To adjust the level of significance for multiple comparisons,
a Bonferroni Correction (BC) was performed. In this
context, if the pig genome is ~2800 Mb long and the average
size of the segment is 2 Mb, there are 1400 segments
along the genome with corresponding multiple tests. Thus,
for $\alpha$ = 0.05, the BC was equal to 0.05/1400 = 3.571429e-05
(adjusted $\alpha$ or critical value). Hence, in order to evaluate
the significance of the segments, a second p-value for the
Likelihood Ratio Test ($p\text{-value}_{LRT}$) was calculated to
compare against BC. This $p\text{-value}_{LRT}$ was assessed as
1 minus the distribution function of a chi-square ($\chi^2$)
random variable with 0.5 degrees of freedom [29,30] as
follows:

$$p\text{-value}_{LRT} = 1 - \Omega(LRT)$$

where $\Omega(x)$ is the distribution function of a random
variable having the $\chi^2_{0.5}$ density, and LRT is the
Likelihood Ratio Test obtained by contrasting appropriate
models.

Results 

Genome screening 

The p-values of the 44055 SNP were obtained as described
in the Methods section. First, the p-values for
$SNP_{ej}$, i.e. using $\mathrm{Var}(\hat{g}_j)$, were plotted along the genome
(Manhattan plot in Figure 1) to identify genomic positions
that are associated with variation in 13-week tenth rib
backfat (mm). Large peaks ($-\log_{10}$(p-value) above 5) can
be seen at chromosomes 6 and 3, suggesting noticeable
genetic variation for the trait. On the other hand, p-values
for $SNP_{epj}$ (i.e. standardized with prediction error variance)
were very large, with a maximum $-\log_{10}$(p-value)
of 0.20. In essence, the pattern observed in Figure 2 is the
result of dividing the non-standardized SNP effects by a
constant. Specifically, the normalizing value was
$[\mathrm{Var}(g_j) - \mathrm{Var}(\hat{g}_j)]$, with $\mathrm{Var}(g_j)$ = 2.6768. The use of the
square root of the difference between those two values resulted
in a practically constant denominator for the test-statistic
that was equal to 2.66. Also, a look at Figure 2
suggests signals at chromosomes 1, 12, 14, and 18, a fact
that is not observed in Figure 1. However, this might be
an artefact of the constant denominator that tends to
overestimate the true variability for some SNP, thus resulting
in corresponding false positives across the genome.

In order to study the type I error rate of the two proposed
tests we performed a plasmode simulation [31]. A
plasmode is a dataset created from real data where some
of the truth is known. In brief, our plasmode is a simulation
that uses reshuffling in a portion of the data as
explained in the Methods section. We performed a simulation
assuming independent SNP, and another one
keeping the dependency between SNP (LD structure) intact.
Simulation results were plotted in a quantile-quantile
(Q-Q) plot (Figure 3) using -log(p-value)
for each case of standardization. First, the
p-values for test1 ($SNP_{ej}$) obtained in the scenario under
independent SNPs (scenario 2, LE) displayed a distribution
identical to that obtained from the reference
distribution U(0, 1). In contrast, under dependency
(scenario 1, LD) less extreme p-values were observed, so that
their distribution departed from the uniform.
This is a well known fact in human genetic epidemiology [32],
where the implementation of the Bonferroni
correction of p-values from associated SNP under the
assumption of independence results in tests that are too
conservative. On the other hand, for test2 ($SNP_{epj}$) even
p-values obtained for independent SNP (scenario 2, LE)
displayed a distribution that was too conservative. Furthermore,
the results from the dependent scenario (LD)
were even more conservative than those from the independent
scenario (results not displayed in the Q-Q plot),
thus indicating that the use of the square root of
$\mathrm{Var}(\hat{g}_j)$ as the denominator of the test-statistic results in a
more powerful and not too liberal choice when compared
to the use of the square root of PEV = $\mathrm{Var}(g_j) - \mathrm{Var}(\hat{g}_j)$.

Figure 1 Manhattan plot for trait 13-week tenth rib backfat (mm) by standardization $SNP_{ej}$. Genome screening for 44055 SNP using
standardization by $\mathrm{Var}(\hat{g}_j)$. $-\log_{10}$(p-value) (y axis) versus the absolute SNP position in Mb (x axis). The red line represents a genome-wide
significance threshold (p < 1.1349 × 10^-6). Numbers from 1 to 18 represent the chromosome ID.

Figure 3 Quantile-quantile plot of the observed and expected -log(p-values) obtained by simulation. Reference distribution was an
independent and uniform distribution U(0, 1) for 1018 p-values simulated (black dotted line). Test1 (scenario1) = under dependent (LD) and
standardization by Var(g) (blue dotted line). Test1 (scenario2) = under independent (LE) and standardization by Var(g) (green dotted line). Test2
(scenario2) = under independent (LE) and standardization by PEV (orange dotted line). Each scenario has 1018 p-values permuted 200 times.
Bands represent confidence intervals of 95% (blue band = test1 (scenario1), green band = test1 (scenario2), pink band = test2 (scenario2)).

SNP effects and tests obtained by the marker model 

The analyses of one SNP tested at a time using the EMMA
procedure [24] resulted in p-values that were almost identical
(Additional file 1) to those of $SNP_{ej}$ (Additional file 2).
The time taken to compute 44055 SNP tests one at a time
was 84 minutes. In comparison, the algorithm used to fit
model (2a) and to perform the tests of standardized effects
took a total time of 29 minutes (CPU and memory: Quad-core
2.7 GHz AMD Opteron 8384, 256 GB). This time
includes the computation of the G matrix, the fit of the
animal model, the back transformation to calculate the
SNP effects, and the calculation of the standard errors that
are needed to compute the test-statistics.

Tests of segment effects 

We also compared the results from our proposed method
to those obtained with a segment-based likelihood ratio
test that has been used by animal breeders [4]. Due to
computational demand, we only performed the LRT to
test for segment effects. Thus, the SNP with the smallest
p-values (or highest $-\log_{10}$(p-values)) on each chromosome
were chosen, whereas no segments were tested
using LRT for regions with $SNP_{epj}$ resulting in exceedingly
low p-values. The three segments from chromosomes with
the smallest p-values are displayed in Table 1, and the
remaining segments from the 15 other chromosomes are
shown in the additional files (Additional file 3). All segments
measured 2 Mb (1 Mb on each side of the SNP
with the smallest p-value). The estimates of the variance
components and the LogLikelihood obtained from model
equation (8) were compared with those from model equation (2a).
These results are displayed in Table 2.

Results from the LRT indicated that the segment
on chromosome 6 was significant: $p\text{-value}_{LRT-6}$ =
1.133459e-09, a number smaller than the critical 0.05
Bonferroni threshold for 1400 segments ($\alpha_{critical}$ = 0.05/
1400 = 3.571429e-05). On the contrary, the segments located
on all other chromosomes were not significant.
The proportion of variance explained by the segment
from chromosome 6 (-log(p-value) = 8.02) was 11% of
the total variance, a fact that was reflected in a similar
reduction of the estimated additive variance ($\sigma_a^2$) in
model (8): 1.952 + 0.698 = 2.650. This latter value is
close to 2.678, i.e. the estimated value of $\sigma_a^2$ from model
(2a) (see Table 2). For all other chromosomal segments,
the estimated value of $\sigma_a^2$ did not decrease by a significant
amount.

Table 1 SNP selected by smallest p-value per chromosome

SNP name       Chromosome   Position Mb   -log10(p-value)   |g|
ALGA0104402    6            136.08        8.02              0.77
H3GA0010564    3            119.34        5.95              0.48
ALGA0032063    5            61.37         3.78              0.42
ALGA0081287    14           125.98        3.28              0.33
DRGA0011971    13           10.47         3.12              0.36
MARC0022304    9            94.99         3.12              0.42
ALGA0106422    16           111.82        2.90              0.28
ASGA0010464    2            62.15         2.79              0.30
ALGA0111088    8            88.01         2.77              0.48
ASGA0078865    18           10.72         2.70              0.49
ALGA0010607    1            302.88        2.69              0.43
MARC0082230    12           6.14          2.59              0.31
ALGA0045724    7            129.47        2.57              0.41
ASGA0092331    4            138.29        2.52              0.27
ASGA0070227    15           111.82        2.48              0.29
ASGA0077393    17           55.27         2.43              0.32
ASGA0045992    10           7.00          2.42              0.30
ALGA0060793    11           10.50         2.38              0.34

SNP name = SNP marker name, Position Mb = Marker physical position in
Mega-Bases, -log10(p-value) = -Logarithm in base 10 of the smallest p-value,
|g| = absolute value of the SNP effect estimated for the trait 13 week tenth rib
backfat (mm).
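As a worked check, the likelihood ratio statistic and the Bonferroni threshold for the chromosome 6 segment can be retraced from the log-likelihoods and variance components reported in Table 2. The symbols $\ell_{m1}$ and $\ell_{m2}$ are introduced here only for illustration, and the last line assumes that the proportion of variance is the segment variance divided by the sum of the three variance components of model (8), which is an interpretation rather than a formula stated in the text.

```latex
\begin{aligned}
LRT_{6} &= 2\,(\ell_{m2} - \ell_{m1}) = 2\,(-1210.800 + 1227.938) = 34.276 \approx 34.28 \\
\alpha_{critical} &= \frac{0.05}{1400} \approx 3.57 \times 10^{-5} \\
\frac{\sigma^2_{a_1}}{\sigma^2_{a_1} + \sigma^2_{a_2} + \sigma^2_{e}} &= \frac{0.70}{0.70 + 1.95 + 3.73} = \frac{0.70}{6.38} \approx 0.11
\end{aligned}
```

The same arithmetic applied to the segments on chromosomes 3 and 5 gives 2(-1223.178 + 1227.938) = 9.52 and 2(-1224.540 + 1227.938) ≈ 6.80, matching the LRT row of Table 2.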

Discussion 

The main goal of this research was to develop a novel
procedure to perform a rapid genome scan, or GWAS
analysis, from a genomic evaluation. Moreover, the sufficient
statistics of our methodology are: the Best Linear
Unbiased Prediction (BLUP) of the breeding values from
an animal model, $G$ as the covariance matrix (or $H$ for a
single step evaluation [33]), $Z$ as the standardized
marker effects matrix, variance components, and $C^{aa}$.
This setting makes the implementation highly feasible
after the genomic evaluation has been performed, as
discussed by Legarra et al. [33].

Variance of the SNP effect 

First, the SNP effects gj were calculated by a linear trans- 
formation of a using expression (3). Then, we calculated 
Var^J^y^ using an expression derived from mixed model 
theory (see (4-5)). Next, we divided gj by the square root 
of Var^^y^ to standardize the effect, and referred the 

statistics as SNPej. The /7-values for the tests of specific 
genome regions were calculated with a level of signifi- 
cance - Logio(/?- value) = 5. Additionally, Prediction Error 



Table 2 Variance components and LogLikelihood for 
models with or without the segment 



Seg-chromosome 


6 


3 


5 


SNP - logio(p-value) 


8.02 


5.94 


3.78 


Lk_m1 


-1227.938 


-1227.938 


-1227.938 


Lk_m2 


-1210.800 


-1223.178 


-1224.540 


LRT 


34.28 


9.52 


6.80 


p-valueLRT 


1.1 X lO"' 


65x 10"" 


3.1 xlO"^ 


VarE_m1 


3.70 


3.70 


3.70 


VarA.ml 


2.68 


2.68 


2.68 


VarE_m2 


3.73 


3.67 


3.69 


VarA_m2 


1.95 


2.42 


2.55 


segmVA 


0.70 


0.63 


0.15 


%segmVA 


0.11 


0.09 


0.02 



Seg-chromosome = Number of chromosome where segment is located, 
ml = model{2a) without the segment: y = Xp + a + e, ml = model (8) with the 
segment y = Xp +a, +02 + e, SNP - logno(p-value) = -Logarithm in base 10 
of the SNP p-value selected to create a segment, Lk_m1 = -LogLikelihood for 
ml, Lk_m2 = -LogLikelihood for m2, LRT = Likelihood Ratio Test for ml and 
m2, p-valueLRT = p-value for LRT, VarE_m1 = Error variance (o^) of ml, 
VarA_m1 = Additive variance (oj) of ml , VarE_m2 = Error variance (o^) of 
m2, VarA_m2 = Additive variance (o^) of m2, segmVA = Additive variance 
segment ^o^^ ^ of m2, %segmVa = Proportion in% of the total variance 
explained by the segment. 

Variance (PEV = $\mathrm{Var}(g_j) - \mathrm{Var}(\hat{g}_j)$) was employed for a
second standardization, and the resulting statistic was called $SNPep_j$. After
the analyses, we obtained higher p-values (maximum -log10(p-value) = 0.20) and
detected stronger signals (higher peaks in the Manhattan plot) for $SNPep_j$
than with $SNPe_j$. Furthermore, a simulation was carried out with the same
structure of SNP markers and animal data as in the current study, in order to
compare the empirical p-values of both standardized tests. The SNP markers of
chromosome 18 were reshuffled, and two scenarios were simulated: 1) dependent
genotypes (LD), and 2) independent genotypes (LE). In neither scenario were the
genotypes related to the phenotype, and both standardized tests were calculated
in each scenario. The reference distribution considered for the p-values was
the uniform. In the independent scenario (LE), standardization with
$\mathrm{Var}(\hat{g}_j)$ gave an empirical distribution of p-values that
resembled the uniform density, but in the dependent scenario (LD) $SNPe_j$
performed conservatively. In contrast, the standardization with
$\mathrm{Var}(g_j) - \mathrm{Var}(\hat{g}_j)$ produced conservative results in
the independent scenario (LE), and very conservative tests in the dependent
scenario (LD). In this context, standardizing SNP effects with
$\mathrm{Var}(\hat{g}_j)$ resulted in p-values that were closer to the simulated
ones. Moreover, the performance of $SNPe_j$ under LD was not too conservative, a
scenario that could be extrapolated to the genotypes in the current study. In
addition, the p-values calculated using the EMMA procedure [24] were similar to
those obtained with $SNPe_j$. These results suggest that $SNPe_j$ controls the
type I error rate (false positives) reasonably well. Also, the computing time
for fitting model (2a) and then calculating (6) using expressions (3)-(5) was
2.5 to 3 times shorter than the computing time for the EMMA model.

SNP            Chromosome   Position (Mb)   -log10(p-value)
ALGA0104402    6            136.08          8.02
H3GA0010564                 119.34          5.95
ALGA0032063    5            61.37           3.78
ALGA0081287    14           125.98          3.28
DRGA0011971    13           10.47           3.12
MARC0022304    9            94.99           3.12
ALGA0106422    16           111.82          2.90
ASGA0010464    2            62.15           2.79
ALGA0111088    8            88.01           2.77
               18           10.72           2.70
ALGA0010607                 302.88          2.69
                            6.14            2.59
ALGA0045724                 129.47          2.57
ASGA0092331                                 2.52
ASGA0070227                 111.82          2.48
               17           55.27           2.43
ASGA0045992    10           7.00            2.42
ALGA0060793    11           10.50
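
What "conservative" means in the simulation described above can be illustrated with a small null experiment. The sketch below is not the authors' simulation: it draws standard normal statistics directly instead of reshuffling genotypes, and `inflated_sd` is a hypothetical stand-in for a denominator that is too large. Under these assumptions, a calibrated statistic should reject at a rate close to the nominal 0.05, while the statistic with the oversized denominator should reject far less often.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(stats, alpha=0.05):
    """Fraction of tests whose p-value falls below alpha."""
    return sum(two_sided_p(z) < alpha for z in stats) / len(stats)


random.seed(1)
n_tests = 20000

# Null statistics: no marker is associated with the phenotype.
null_stats = [random.gauss(0.0, 1.0) for _ in range(n_tests)]

# Hypothetical case: the standard deviation in the denominator is 1.5 times
# too large, so every statistic shrinks toward zero.
inflated_sd = 1.5
shrunken_stats = [z / inflated_sd for z in null_stats]

calibrated = rejection_rate(null_stats)
conservative = rejection_rate(shrunken_stats)
print(f"calibrated: {calibrated:.3f}, conservative: {conservative:.3f}")
```

A rejection rate far below the nominal level means fewer false positives, but also a loss of power to detect real associations.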

In order to identify SNPs with important phenotypic associations [34], the
calculation of SNP effects $g_j$ from genomic breeding values $a$ [8,9,34] has
been used in several studies [5,20,21]. In this context, the variance of SNP
effects has been estimated using different approaches. Wang et al. [21]
employed the classical definition of the variance of additive effects from
quantitative genetics [35], so that the variance for each jth marker

was obtained as follows: $\sigma^2_{g_j} = \hat{g}_j^2 \, 2p_j(1-p_j)$. Alternatively,
McClure et al. [20] proposed equating the variance of SNP effects to
$\sigma^2_a / \left[2\sum_j p_j(1-p_j)\right]$, and then normalizing the SNP
effects with the square root of this estimated and constant variance. This test
performed similarly to $SNPep_j$ (7) when the estimated SNP effect $\hat{g}_j$
was divided by a constant denominator, a value almost equal to the prior
variance (2.67), and resulted in a very conservative test.
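
The two variance definitions discussed above can be compared with a few lines of arithmetic. This is a sketch under stated assumptions: the allele frequencies, SNP effects and additive variance are hypothetical, and the constant variance is assumed to take the common form of the additive variance divided by the sum of 2p(1 - p) over markers.

```python
def per_marker_variance(g_hat, p):
    """Marker-specific variance: g_hat^2 * 2p(1 - p)."""
    return g_hat ** 2 * 2.0 * p * (1.0 - p)


def constant_variance(sigma2_a, freqs):
    """One variance for all markers (assumed form): sigma2_a / sum 2p(1 - p)."""
    return sigma2_a / sum(2.0 * p * (1.0 - p) for p in freqs)


# Hypothetical allele frequencies, estimated SNP effects and additive variance.
freqs = [0.5, 0.2, 0.1]
g_hats = [0.10, -0.30, 0.05]
sigma2_a = 2.0

per_marker = [per_marker_variance(g, p) for g, p in zip(g_hats, freqs)]
shared = constant_variance(sigma2_a, freqs)

for g, p, v in zip(g_hats, freqs, per_marker):
    print(f"g_hat={g:+.2f} p={p:.1f} per-marker variance={v:.5f}")
print(f"constant variance for every marker: {shared:.3f}")
```

The first definition changes with each estimated effect, whereas the second gives every marker the same denominator regardless of how precisely its effect was estimated.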

In contrast, the advantage of the standardized test ($SNPe_j$) presented here is
that each SNP effect is scaled by its own (and different) standard deviation,
rather than by a prior variance [20] or by the square of each specific SNP
effect $\hat{g}_j^2$ [21] used as a variance. Furthermore, the computation of
$SNPe_j$ involves the same variance for the SNP markers and the animals, i.e.
$\sigma^2_g = \sigma^2_a$, and the use of the standardized incidence matrix Z, a
function of $2p_j(1 - p_j)$, brings this latter quantity into $SNPe_j$.
Additionally, the matrix Z uses the allele frequencies from the $F_0$
generation, calculated with unrelated individuals, and a proper expected
variance by marker (see Methods section). In addition, the test statistic
$SNPe_j$ that standardizes SNP effects produces a p-value, a result that is
appealing to many researchers who are more familiar with testing one SNP at a
time than with the proportion of additive variance explained by a genomic
region. A further advantage of the method is that the detection of many false
positives is avoided, and genome positions with sizeable effects are
highlighted.
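
The effect of scaling each SNP by its own standard deviation can be sketched as follows. The marker names, effects and variances are hypothetical, and the p-value is taken from a standard normal reference distribution, which is an assumption of this sketch and not a detail stated in this section.

```python
import math


def snpe(g_hat, var_g_hat):
    """Standardized statistic and two-sided normal p-value for one SNP."""
    stat = g_hat / math.sqrt(var_g_hat)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical (name, estimated effect, variance of the estimate) triples.
markers = [
    ("snp_a", 0.12, 0.0009),
    ("snp_b", 0.12, 0.0036),
    ("snp_c", -0.02, 0.0004),
]

results = {name: snpe(g, v) for name, g, v in markers}
for name, (stat, p_value) in results.items():
    print(f"{name}: statistic={stat:+.2f}, p-value={p_value:.2e}")
```

Here `snp_a` and `snp_b` have the same estimated effect but different variances, so they receive different statistics and p-values. A single constant denominator could not make that distinction.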

Candidate segment approach 

Later in the research, genome segments that expressed higher signals were
located. For this purpose, the SNPs with the smallest p-values from $SNPe_j$ (6)
were selected, and for each of these SNPs a segment 2 Mb long (1 Mb on each
side) was created. The next step was to estimate the variance components and
the log-likelihood from the centered animal models (2a) and (8). The latter
model includes the random vector of SNP segment effects. Lastly, we compared
the performance of both models. Hayes et al. [4] used a similar model to (8),
although the random SNP effect was taken from the breeding value and fitted as
a separate segment effect. We observed similar results from the use of either
approach. The advantage of fitting model (8) is that matrix G is the same for
all segments, so it was calculated only once and stored in memory for the
calculations, whereas in the model of Hayes et al. [13] a different G has to be
calculated for each segment. This implies a longer computing time and higher
memory requirements to obtain results similar to those from model (8).
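
The segment construction described above (lead SNP per chromosome, then 1 Mb on each side) can be sketched as below. The records and field names are hypothetical and stand in for the output of a genome scan. Only the selection step is shown, not the fitting of the segment model.

```python
def top_snp_per_chromosome(snps):
    """Keep, for each chromosome, the SNP with the smallest p-value."""
    best = {}
    for snp in snps:
        chrom = snp["chrom"]
        if chrom not in best or snp["p"] < best[chrom]["p"]:
            best[chrom] = snp
    return best


def segment_members(snps, lead, flank_mb=1.0):
    """Names of the SNPs within flank_mb on either side of the lead SNP."""
    lo = lead["pos_mb"] - flank_mb
    hi = lead["pos_mb"] + flank_mb
    return [s["name"] for s in snps
            if s["chrom"] == lead["chrom"] and lo <= s["pos_mb"] <= hi]


# Hypothetical scan results: name, chromosome, position (Mb), p-value.
snps = [
    {"name": "snp1", "chrom": 6, "pos_mb": 134.50, "p": 1e-3},
    {"name": "snp2", "chrom": 6, "pos_mb": 135.40, "p": 1e-5},
    {"name": "snp3", "chrom": 6, "pos_mb": 136.08, "p": 1e-8},
    {"name": "snp4", "chrom": 6, "pos_mb": 137.30, "p": 1e-2},
    {"name": "snp5", "chrom": 5, "pos_mb": 61.37, "p": 1e-4},
]

leads = top_snp_per_chromosome(snps)
segments = {chrom: segment_members(snps, lead) for chrom, lead in leads.items()}
print(segments)
```

The markers listed for each chromosome are the ones that would enter the segment term of the model, while the genome-wide matrix G stays the same for every segment.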

To evaluate the significance of the segments, the effect of each chromosome
segment was tested with the likelihood ratio test. The size of the test was
adjusted by the Bonferroni correction. As a result, the segment located on
chromosome 6 (physical position 135 Mb to 137 Mb) was significant, and
explained 11% of the total variance of the trait. Previous studies by Edwards
et al. [10] and Choi et al. [36], using microsatellites and a small number of
SNPs, found significant regions (physical positions between 135 and 139 Mb) on
chromosome 6 for 13-week tenth-rib backfat in the population under study.
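
The testing step above can be sketched as a likelihood ratio test with a Bonferroni-adjusted threshold. The log-likelihood values and the number of segments are hypothetical, and the reference distribution is assumed to be chi-square with one degree of freedom, which this section does not state.

```python
import math


def lrt_p_value(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and its p-value against chi-square(1)."""
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return lrt, math.erfc(math.sqrt(lrt / 2.0))


n_segments = 18                 # hypothetical: one candidate segment per chromosome
alpha = 0.05
threshold = alpha / n_segments  # Bonferroni-adjusted size of each test

# Hypothetical log-likelihoods of the full model (8) and the reduced model (2a).
lrt, p_value = lrt_p_value(-1200.0, -1210.0)
significant = p_value < threshold
print(f"LRT={lrt:.1f}, p-value={p_value:.2e}, threshold={threshold:.2e}, "
      f"significant={significant}")
```

Because the segment variance is tested at the boundary of its parameter space, a mixture of chi-square distributions is sometimes used instead. The plain chi-square(1) reference shown here would then be the more conservative choice.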

Additionally, forty-eight markers located between 128 Mb and 139 Mb on
chromosome 6 (http://www.animalgenome.org/QTLdb/pig.html) have been reported to
be associated with the trait. Furthermore, recent studies showed the importance
of chromosome 6 [37,38] in the expression of the trait. Therefore, our results
confirm the presence of genetic variability in the trait from chromosome 6.

Conclusions 

Fast genome screening of SNP effects linearly transformed from genomic breeding
values is advantageous as a by-product of genomic evaluations for different
species of farm animals. Moreover, the standardized test of SNP effects using
their own variance, $\mathrm{Var}(\hat{g}_j)$, developed in this study helps to
detect specific genomic regions involved in the additive variation of the trait
and to reduce false positive locations, using less computing time.
Additionally, genome segments of about 2 Mb, formed by surrounding the SNP with
the smallest p-value on each chromosome and tested with a standardized test
involving $\mathrm{Var}(\hat{g}_j)$ and with the Bonferroni correction, could
detect genome regions responsible for sizeable fractions of the trait genetic
variance. This methodology, involving a genome scan and a candidate segment
approach, is a useful method for meta-analyses of genome-wide association
studies, as it enables the detection of specific genome regions that affect an
economically relevant trait when using multiple populations. Code and data to
obtain and reproduce the results presented are publicly available at
https://www.msu.edu/~steibelj/JP_files/GBLUP.html.